Multiplier cell and method of computing

ABSTRACT

An integrated cryptographic system (24) executes a mathematical algorithm that computes equations for public-key cryptography. An arithmetic processor (22) receives data values stored in a temporary storage memory (14) and computes both the Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC) algorithms. Multiplication cells (270 and 280) have an INT/POLY terminal that selects a C-register (246) for computing RSA modular exponentiation or ECC elliptic curve point multiplication.

This is a divisional application of U.S. application Ser. No. 09/215,935, filed on Dec. 18, 1998.

BACKGROUND OF THE INVENTION

The present invention relates, in general, to public-key cryptography and, more particularly, to a public-key cryptographic integrated circuit.

Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC) are public-key cryptographic algorithms that provide high security for digital data transfers between electronic devices. The modular mathematics of the RSA and ECC (F_p) algorithms can be computed on one hardware multiplier, and the polynomial mathematics of the ECC (F_(2^M) in polynomial basis) algorithm can be computed on a different hardware multiplier. Both hardware multiplier architectures can use pipelining techniques for the massively parallel computations of the algorithms. A pipelined multiplier offers the lower power consumption that many applications require.

Hardware implementations for computing the RSA and ECC algorithms are not straightforward. Thus, the type of cryptography best suited for the system application defines the appropriate hardware multiplier architecture that computes the desired RSA or ECC algorithm. With increasing demand for faster cryptographic operations and higher performance, improvements to hardware modular multiplier architectures are needed to ensure high levels of security.

Accordingly, it would be advantageous to provide cryptography in a multiplication system that achieves high performance, low cost, and low power for implementation in an integrated circuit. It would be a further advantage for the multiplication system to compute both the RSA and ECC algorithms.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating one embodiment of an integrated cryptographic system having an RSA arithmetic processor and a separate ECC arithmetic processor;

FIG. 2 is a block diagram illustrating another embodiment of an integrated cryptographic system having a single processor for computing algorithms for both RSA and ECC data cryptography;

FIG. 3 is a schematic diagram showing one embodiment of a portion of the single processor of FIG. 2;

FIG. 4 is a schematic diagram showing another embodiment of a portion of the single processor of FIG. 2;

FIG. 5 is a schematic diagram showing a portion of a multiplier for computing the ECC algorithm (F_(2^M) in the polynomial basis);

FIG. 6 is a block diagram that illustrates a 1×N multiplier for computing either the RSA or the ECC algorithm;

FIG. 7 is a schematic diagram of a cell used in the C-register of the multiplier of FIG. 6 for single-cycle multiplication operations; and

FIG. 8 is a schematic diagram of another cell used in the C-register of the multiplier of FIG. 6 for two-cycle multiplication operations.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Generally, the present invention of an integrated cryptographic circuit provides cryptographic functions that support the Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC) algorithms. The cryptographic integrated circuit has applications in internet commerce, paging, cellular phones, smartcards, and smartcard terminals, among others. Data such as personal health records, financial records, fingerprints, and retina eye prints are encrypted using functions that include integer modular multiplication, modular polynomial multiplication, addition, subtraction, and exponentiation. The integrated cryptographic circuit provides a hardware architecture that efficiently computes both the RSA and the ECC algorithms.

FIG. 1 is a block diagram illustrating an embodiment of an integrated cryptographic system 10 having an RSA arithmetic processor 18 and a separate ECC arithmetic processor 20. The single-chip cryptographic system 10 is configured to operate in a data communication network and perform cryptographic functions using either the RSA or the ECC algorithm. Cryptographic system 10 includes a host interface block 12 having an input connected to an INTERFACE BUS. Data signals are transmitted and received via the INTERFACE BUS to/from other electronic devices (not shown) outside cryptographic system 10. By way of example, a microprocessor, a Random Access Memory (RAM), a Read Only Memory (ROM), a Memory Access Controller (MAC), a Secure Memory Management Unit (SMMU), and a Universal Asynchronous Receive/Transmit (UART) block are electronic devices external to cryptographic system 10 that provide and control data at the terminals of Host Interface block 12. The blocks external to cryptographic system 10 are not shown in the figures.

Cryptographic system 10 further includes a temporary storage memory 14 having an input connected to Host Interface block 12. Temporary storage memory 14 receives data values that allow cryptographic system 10 to perform public-key cryptographic functions. Thus, memory 14 stores the data values that support the RSA modular exponentiation performed by RSA arithmetic processor 18 and, in addition, the data values that support the elliptic curve point multiplication performed by ECC arithmetic processor 20.

Specifically, for the RSA modular exponentiation, memory 14 stores data values such as a modulus value N, operand values A and B, exponent values, and partial product values. In addition, for the ECC elliptic curve point multiplication, memory 14 stores data values such as an irreducible polynomial, a value for odd prime fields, an ECC system-wide parameter for the generator point, elliptic curve coefficients, a point scalar, and temporary values.

Typically, the storage capacity of memory 14 roughly supports a four-to-one key size ratio of RSA to ECC. For example, if the memory supported an RSA key size of 1024 bits, then the same memory could support an ECC key size of up to approximately 256 bits. Thus, memory 14 provides a lower level of security when using the RSA algorithm than when using the ECC algorithm. By using memory 14 to store data values for both RSA arithmetic processor 18 and ECC arithmetic processor 20, the silicon area and total cost of cryptographic system 10 are reduced.

Similar types of software instructions can be used for computing both the ECC and RSA algorithms. By way of example, the RSA algorithm uses the binary square-and-multiply routine in computing exponential functions, while the ECC algorithm uses the double-and-add routine in the computation of point multiplies. Thus, similar software routines are used to support mathematical operations using either the RSA or the ECC algorithm. Similarities can also be found between the multiplies of the respective algorithms, e.g., integer modulo-N multiplies for RSA and modular multiplies in the polynomial basis for ECC.
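The structural similarity between the two routines can be illustrated with a short sketch in which a single left-to-right binary loop performs square-and-multiply when its operation is modular multiplication and double-and-add when its operation is elliptic curve point addition. The helper names (`binary_method`, `ec_add`), the toy curve y^2 = x^3 + 2x + 3 over F_97, the point (3, 6), and the small RSA-style modulus 3233 are illustrative assumptions, not parameters from this document, and the code is not hardened against side channels. It uses the three-argument `pow` for modular inverses (Python 3.8 or later).

```python
# Toy curve y^2 = x^3 + 2x + 3 over F_97 (illustrative parameters only).
P, A_COEF = 97, 2

def binary_method(k, base, identity, op):
    """Left-to-right binary method shared by both algorithms.

    op(acc, acc) is the square (RSA) or double (ECC) step;
    op(acc, base) is the multiply (RSA) or add (ECC) step.
    """
    acc = identity
    for bit in bin(k)[2:]:          # scan k from its most significant bit
        acc = op(acc, acc)
        if bit == "1":
            acc = op(acc, base)
    return acc

def ec_add(p, q):
    """Affine point addition on the toy curve; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                            # P + (-P)
    if p == q:
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P        # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

# Square-and-multiply: modular exponentiation with a small RSA-style modulus.
n = 3233
assert binary_method(65537, 42, 1, lambda x, y: x * y % n) == pow(42, 65537, n)

# Double-and-add: scalar multiplication of the point G = (3, 6).
G = (3, 6)
repeated = None
for _ in range(20):
    repeated = ec_add(repeated, G)
assert binary_method(20, G, None, ec_add) == repeated
```

Only the operation passed to `binary_method` changes between the two cases, which mirrors the observation that one control sequence can drive both algorithms.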

In operation, the data values stored in memory 14 are transferred to RSA arithmetic processor 18 or to ECC arithmetic processor 20. A control circuit 16 provides control signals that manage the transfer of data values between memory 14, RSA arithmetic processor 18, ECC arithmetic processor 20, and Host Interface block 12. In addition, the control signals generated in control circuit 16 control the mathematical computations that are provided by RSA arithmetic processor 18 and ECC arithmetic processor 20 in the processing of data. Put another way, a control signal from control circuit 16 enables RSA arithmetic processor 18 for computing the RSA algorithm or ECC arithmetic processor 20 for computing the ECC algorithm. The similarities that exist between the RSA and ECC algorithms reduce the number of control signals generated by control circuit 16.

FIG. 2 is a block diagram illustrating another embodiment of an integrated cryptographic system 24 having a single arithmetic processor 22 for computing the RSA and ECC algorithms. It should be noted that the same reference numbers are used in the figures to denote the same elements. This embodiment of cryptographic system 24 connects other electronic devices (not shown) to host interface block 12 through an INTERFACE BUS. Data signals are transferred through Host Interface block 12 to temporary storage memory 14 for storing data. Control circuit 16 provides control signals to arithmetic processor 22 that manage the transfer of data values from temporary storage memory 14 and control the functions provided by arithmetic processor 22. One such control signal generated by control circuit 16 is the INT/POLY signal, which selects or enables arithmetic processor 22 to perform the mathematical operations of either the RSA algorithm or the ECC algorithm. Thus, arithmetic processor 22 provides cryptographic functions based either on RSA modular exponentiation or on ECC elliptic curve point multiplication.

FIG. 3 is a schematic diagram showing one embodiment of a portion of the single arithmetic processor 22 of FIG. 2. Arithmetic processor 22 performs a multiplication of operands A and B and supplies a product value, i.e., P_(i+0), P_(i+1), P_(i+2), and P_(i+3), to a modulo reducer 60. Operands A and B can be numerical data or plain text strings that are converted to ordinal numbers using the American Standard Code for Information Interchange (ASCII) or other transformed character sets.
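One possible way to turn a plain-text string into an ordinal operand is to encode its characters as ASCII bytes and read those bytes as a single integer. The sketch below assumes big-endian packing (first character most significant); this document does not specify a byte order or padding scheme, and `text_to_operand` and `operand_to_text` are hypothetical names.

```python
def text_to_operand(text):
    """Encode ASCII text as bytes and read them as one big-endian integer."""
    return int.from_bytes(text.encode("ascii"), "big")

def operand_to_text(value, length):
    """Inverse mapping, given the original number of characters."""
    return value.to_bytes(length, "big").decode("ascii")

a = text_to_operand("HELLO")
print(hex(a))                     # 0x48454c4c4f
print(operand_to_text(a, 5))      # HELLO
```

The inverse needs the character count because leading NUL characters would otherwise be lost in the integer form.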

Modulo reducer 60 of arithmetic processor 22 includes an adder array having X columns and Y rows, where X and Y are integers. The preferred embodiment of the adder array has sixteen columns and sixteen rows. However, it should be noted that the present invention is not limited to an adder array having sixteen columns and sixteen rows or to an array having matching numbers of rows and columns. For illustrative purposes, modulo reducer 60 is described in simplified form as a four-by-four array of adders along with associated logic.

Adders 90, 92, 94, and 96 are in column X₀; adders 100, 102, 104, and 106 are in column X₁; adders 110, 112, 114, and 116 are in column X₂; and adders 120, 122, 124, and 126 are in column X₃ of the adder array of modulo reducer 60. Adders 90-96, 100-106, 110-116, and 120-126 each have first and second data inputs, a carry input (CI), a carry output (CO), and a sum output (S).

The first inputs of adders 90, 92, 94, and 96 in column X₀ are connected to respective input terminals 80, 82, 84, and 86. Two-input AND-gates 89, 91, 93, and 95 each have a first input commonly connected to each other and to a Q output of a latch 128. The outputs of AND-gates 89, 91, 93, and 95 are connected to the second inputs of adders 90, 92, 94, and 96, respectively. In addition, a carry output (CO) of adder 90 is coupled through an AND-gate 90A to a carry input (CI) of adder 92, a carry output of adder 92 is coupled through an AND-gate 92A to a carry input of adder 94, and a carry output of adder 94 is coupled through an AND-gate 94A to a carry input of adder 96. The carry output of adder 96 is coupled through an AND-gate 96A to a data input of a latch 152. The output of latch 152 is connected to the carry input of adder 90.

Logic gates such as, for example, AND-gates 90A, 92A, 94A, and 96A are also referred to as blocking circuits. When the select or enable signal common to all of the blocking circuits has a logic one value, the carry signal is transferred through the blocking circuit. On the other hand, when the select or enable signal has a logic zero value, the carry signal is blocked, i.e., inhibited from propagating through the blocking circuit.

The first inputs of adders 100, 102, 104, and 106 in column X₁ are connected to the respective outputs of adders 90, 92, 94, and 96 in column X₀. Two-input AND-gates 99, 101, 103, and 105 have a first input commonly connected to each other and to a Q output of a latch 132. The outputs of AND-gates 99, 101, 103, and 105 are connected to the second inputs of adders 100, 102, 104, and 106, respectively. In addition, a carry output of adder 100 is coupled through an AND-gate 100A to a carry input of adder 102, a carry output of adder 102 is coupled through an AND-gate 102A to a carry input of adder 104, and a carry output of adder 104 is coupled through an AND-gate 104A to a carry input of adder 106. The carry output of adder 106 is coupled through an AND-gate 106A to a data input of a latch 156. The output of latch 156 is connected to the carry input of adder 100.

The first inputs of adders 110, 112, 114, and 116 in column X₂ are connected to the respective outputs of adders 100, 102, 104, and 106 in column X₁. Two-input AND-gates 109, 111, 113, and 115 have a first input commonly connected to each other and to a Q output of a latch 136. The outputs of AND-gates 109, 111, 113, and 115 are connected to the second inputs of adders 110, 112, 114, and 116, respectively. In addition, a carry output of adder 110 is coupled through an AND-gate 110A to a carry input of adder 112, a carry output of adder 112 is coupled through an AND-gate 112A to a carry input of adder 114, and a carry output of adder 114 is coupled through an AND-gate 114A to a carry input of adder 116. The carry output of adder 116 is coupled through an AND-gate 116A to a data input of a latch 160. The output of latch 160 is connected to the carry input of adder 110.

The first inputs of adders 120, 122, 124, and 126 in column X₃ are connected to the respective outputs of adders 110, 112, 114, and 116 in column X₂. Two-input AND-gates 119, 121, 123, and 125 have a first input commonly connected to each other and to a Q output of a latch 140. The outputs of AND-gates 119, 121, 123, and 125 are connected to the second inputs of adders 120, 122, 124, and 126, respectively. In addition, a carry output of adder 120 is coupled through an AND-gate 120A to a carry input of adder 122, a carry output of adder 122 is coupled through an AND-gate 122A to a carry input of adder 124, and a carry output of adder 124 is coupled through an AND-gate 124A to a carry input of adder 126. The carry output of adder 126 is coupled through an AND-gate 126A to a data input of a latch 162. The output of latch 162 is connected to the carry input of adder 120. The outputs S of adders 120, 122, 124, and 126 are connected to respective output terminals 164, 166, 168, and 170.

AND-gates 90A-96A, 100A-106A, 110A-116A, and 120A-126A are enabled when arithmetic processor 22 is computing integer-modulo-N multiplications and are not enabled when the arithmetic processor is computing modular polynomial-basis multiplications. In other words, the carry-out signals of respective adders 90-96, 100-106, 110-116, and 120-126 are not propagated when modular polynomial-basis multiplications are being computed. The letter "A" has been appended to the reference number of each AND-gate to signify that each adder, such as adder 90, has a corresponding AND-gate, i.e., 90A, that either passes or blocks the carry output of that adder from being transferred to the carry input of an adjacent adder.
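The effect of the blocking AND-gates can be modeled in a few lines. With the gates enabled, a column of full adders performs ordinary binary addition; with every carry forced to zero, each sum bit reduces to a XOR b, which is the addition of polynomial coefficients over GF(2). This is a bit-level behavioral sketch under that interpretation, not a gate-accurate model of modulo reducer 60, and `full_adder` and `adder_column` are hypothetical names.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def adder_column(x, y, width, carry_in=0, int_mode=True):
    """Add two width-bit values through a ripple chain with blocking AND gates.

    int_mode=True models integer operation (carries pass the gates);
    int_mode=False models polynomial operation (every carry is forced to 0).
    Returns (sum, carry_out).
    """
    enable = 1 if int_mode else 0
    carry = carry_in & enable          # the latched carry also came through a gate
    total = 0
    for i in range(width):
        s, cout = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        carry = cout & enable          # blocking circuit, e.g. AND-gate 90A
    return total, carry

print(adder_column(0b0111, 0b1101, 4))                  # (4, 1): 7 + 13 = 20
print(adder_column(0b0111, 0b1101, 4, int_mode=False))  # (10, 0): 0111 XOR 1101
```

The operands 0111 and 1101 are the P and N values that enter column X₀ in the worked example of this section; in integer mode the sum 0100 with a carry out matches the column X₀ result described there.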

Further, the second inputs of AND-gates 89, 101, 113, and 125 are commonly connected to each other and to input terminal 81. The second inputs of AND-gates 91, 103, and 115 are commonly connected to each other, to an input of a latch 158, and to input terminal 83. The second inputs of AND-gates 93 and 105 are commonly connected to each other, to an input of a latch 154, and to input terminal 85. The second input of AND-gate 95 is commonly connected to an input of a latch and to input terminal 87. The second inputs of AND-gates 99, 111, and 123 are commonly connected to each other and to an output of latch 150. The second inputs of AND-gates 109 and 121 are commonly connected to each other and to an output of latch 154. The second input of AND-gate 119 is connected to an output of latch 158.

Latches 128, 132, 136, and 140 each have a set input (S), a reset input (R), and an output (Q). Latches 128, 132, 136, and 140 are enabled when signal T is high, causing the signal at output Q to have the same value as the signal at input S. The signals at the Q outputs are latched when the signal T transitions from a high to a low logic value. The signal at input R resets the signals at the Q outputs. The reset inputs R of latches 128, 132, 136, and 140 are commonly connected to each other and to a terminal 79. Terminal 79 is coupled for receiving a reset signal R. A two-input AND-gate 130 has an output connected to the set input of latch 128. The first input of AND-gate 130 is connected to the first input of adder 90. A two-input AND-gate 134 has an output connected to the set input of latch 132. The first input of AND-gate 134 is connected to the first input of adder 102. A two-input AND-gate 138 has an output connected to the set input of latch 136. The first input of AND-gate 138 is connected to the first input of adder 114. A two-input AND-gate 142 has an output connected to the set input of latch 140. The first input of AND-gate 142 is connected to the first input of adder 126. The second inputs of AND-gates 130, 134, 138, and 142 are commonly connected to each other and to terminal 78. Terminal 78 is coupled for receiving a signal T.

Large operands such as, for example, two 1024-bit operands are multiplied using pipelining techniques and multiple passes or rotations through a multiplier (not shown). Typically, the larger operands A and B are segmented into smaller groups that are referred to as digits, e.g., digits A₀-A_(N) and B₀-B_(N). The pipelined multiplier has an array size that is appropriate for multiplying the digits. By way of example, the digits A₀-A_(N) and B₀-B_(N) are 16-bit binary numbers and the multiplier is a 16-bit multiplier, although this is not a limitation of the present invention.
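The digit segmentation can be illustrated by rebuilding a full product from 16-bit by 16-bit digit products only. The sketch below is a plain schoolbook method that ignores pipelining and modular reduction; `DIGIT_BITS`, `to_digits`, and `digit_multiply` are hypothetical names, and the 64-digit count assumes 1024-bit operands split into 16-bit digits.

```python
DIGIT_BITS = 16   # digit width used in the example in the text

def to_digits(x, n_digits, w=DIGIT_BITS):
    """Split x into n_digits w-bit digits, least significant digit first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n_digits)]

def digit_multiply(a, b, n_digits, w=DIGIT_BITS):
    """Schoolbook product assembled only from w-bit x w-bit digit products."""
    a_digits = to_digits(a, n_digits, w)
    b_digits = to_digits(b, n_digits, w)
    acc = 0
    for i, a_i in enumerate(a_digits):
        for j, b_j in enumerate(b_digits):
            acc += (a_i * b_j) << (w * (i + j))   # one pass through the digit multiplier
    return acc

a = (1 << 1023) | 12345        # a 1024-bit operand
b = 3 ** 600                   # a roughly 951-bit operand
assert digit_multiply(a, b, 64) == a * b   # 64 digits x 16 bits = 1024 bits
```

A hardware implementation would schedule these digit products through the pipelined array rather than accumulate them into one wide integer.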

In general, integer-modulo-N Montgomery multiplications take the formof:

(A*R mod N)(B*R mod N)+μ*N

where:

A is the first operand and an integer;

B is the second operand and an integer;

N is an integer having an odd value;

mod N is a remainder value of (A*B*R)/N that defines the number ofelements in the finite field;

R is an integer power of two having a value greater than the value of N; and

μ is a reduction value that is computed such that (A*R mod N)(B*R mod N)+μ*N is an integer that can be divided by R without a loss of significant bits.
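For reference, the word-level relation behind these definitions can be checked numerically: choosing μ = -(product)·N⁻¹ mod R makes the sum an exact multiple of R, so the division by R is only a shift. The sketch below computes μ in one step rather than bit by bit, uses the three-argument `pow` for the modular inverse (Python 3.8 or later), and uses hypothetical names `to_montgomery` and `montgomery_reduce`.

```python
def to_montgomery(x, n, r_bits):
    """Map x to Montgomery form x*R mod N, with R = 2**r_bits."""
    return (x << r_bits) % n

def montgomery_reduce(t, n, r_bits):
    """Return (t * R^-1 mod N, mu) for 0 <= t < N*R and odd N < R = 2**r_bits."""
    r = 1 << r_bits
    assert n % 2 == 1 and n < r, "N must be odd and smaller than R"
    mu = (-t * pow(n, -1, r)) % r        # makes t + mu*N an exact multiple of R
    u = (t + mu * n) >> r_bits           # division by R is just a right shift
    return (u - n if u >= n else u), mu  # u < 2N, so one subtraction suffices

a_m = to_montgomery(9, 13, 4)            # 1, i.e. 0001
b_m = to_montgomery(11, 13, 4)           # 7, i.e. 0111
print(montgomery_reduce(a_m * b_m, 13, 4))   # (11, 13)
```

With A = 9, B = 11, N = 13, and R = 16, the reduction value is 13 (binary 1101) and the result 11 equals 9·11·16 mod 13, the Montgomery form of the product.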

In operation, modulo reducer 60 receives the product of (A*R mod N) and (B*R mod N) and generates reduced partial product outputs for integer-modulo-N multiplications. For simplicity and illustrative purposes, integer-modulo-N multiplications are described using the following example for four-bit numbers. Referring to FIG. 3, input terminals 80, 82, 84, and 86 receive the respective product terms P_(i+0), P_(i+1), P_(i+2), and P_(i+3) that result from multiplying operands such as, for example, operands A₀ and B₀. In addition, input terminals 81, 83, 85, and 87 receive the values N_(i+0), N_(i+1), N_(i+2), and N_(i+3), i.e., values for the integer N. Modulo reducer 60 generates a reduced product term for modular multiplication at output terminals 164-170.

Modulo reducer 60 implements the Foster-Montgomery Reduction Algorithm. In the Foster-Montgomery Reduction Algorithm, the logic values at particular bit locations determine whether the value of N is aligned and added to a summed value. The architecture of modulo reducer 60 allows the value of N to be both aligned and added to the summed value when the logic value at a particular bit location has a logic one value. By aligning and adding the value of N, the value of μ is determined and stored in latches 128, 132, 136, and 140. In other words, the value of μ is determined during the reduction process that generates the reduced product term at output terminals 164-170, and not prior to the multiplication of digits A₀ and B₀.
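The bit-serial idea can be expressed as a loop over bit positions: whenever bit i of the running sum is one, N shifted left by i is added, which clears that bit because N is odd, and bit i of μ is set. This is a behavioral sketch of the arithmetic, assuming each column of the array handles one bit position; it does not model the latches, the T signal, or pipelining, and `reduce_bit_serial` is a hypothetical name.

```python
def reduce_bit_serial(p, n, k):
    """Return (mu, u) with p + mu*n divisible by 2**k and u = (p + mu*n) >> k."""
    assert n % 2 == 1, "N must be odd"
    acc, mu = p, 0
    for i in range(k):
        if (acc >> i) & 1:      # bit i of the running sum is one ...
            acc += n << i       # ... so add N aligned to bit i, which clears bit i
            mu |= 1 << i        # and bit i of mu becomes one
    return mu, acc >> k

print(reduce_bit_serial(0b0111, 0b1101, 4))   # (13, 11): mu = 1101
```

Because μ is assembled from the decisions made while reducing, no separate inverse computation is needed before the multiplication starts.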

An example is described where the term (A*R mod N) has the value 0001 when using base-two numbers and A₁₀=9, R₁₀=16, and N₁₀=13. Further, the term (B*R mod N) has the value 0111 when B₁₀=11, R₁₀=16, and N₁₀=13. Note that operands A₀ and B₀ are pre-multiplied by R for Montgomery multiplication to simplify a hardware modular reduction problem. When the operands (A*R mod N) and (B*R mod N) are multiplied, the product terms P_(i+3), P_(i+2), P_(i+1), and P_(i+0) have the respective values 0, 1, 1, and 1, i.e., 0111.

Initially, a reset signal at terminal 79 causes the Q outputs of latches 128, 132, 136, and 140 to have logic zero values. AND-gate 130 receives the product term P_(i+0), having a logic one value, at one input and the signal T, having a logic one value, at the other input. AND-gate 130 generates a logic one value at its output, which causes latch 128 to set, i.e., the signal at the Q output has a logic one value. It should be noted that the signal T has a logic one value during the time that operands A₀ and B₀, i.e., the lower order digits of operands A and B, are multiplied together. It should be further noted that the logic one value at the Q output of latch 128 causes AND-gates 89, 91, 93, and 95 to be enabled and pass the values N_(i+0), N_(i+1), N_(i+2), and N_(i+3) to the second inputs of adders 90, 92, 94, and 96, respectively. Thus, the adders located in column X₀ generate output signals that are the sum of the values N_(i+0), N_(i+1), N_(i+2), and N_(i+3) and the corresponding values of P_(i+0), P_(i+1), P_(i+2), and P_(i+3).

The logic one values at the first and second inputs of adder 90 cause output S to supply a logic zero value. Further, adder 90 generates a carry signal at output CO. Adder 92 receives a logic one value at the first input, a logic zero value at the second input, and a logic one value for the carry signal at input CI. The signal at output S of adder 92 has a logic zero value and the carry signal at output CO has a logic one value.

Adder 94 receives a logic one at the first input, a logic one at the second input from AND-gate 93, and a carry signal enabled through AND-gate 92A. The output S of adder 94 has a logic one value and the carry-out signal has a logic one value at carry output CO. Likewise, adder 96 receives a logic zero at the first input, a logic one at the second input from AND-gate 95, and a carry signal enabled through AND-gate 94A. The signal at output S of adder 96 has a logic zero value and the carry signal at carry output CO has a logic one value. In accordance with the Foster-Montgomery Reduction Algorithm, the particular bit location having a logic one value, i.e., the least significant bit location at input terminal 80, causes the value N to be aligned and added to the value P.

Again, according to the Foster-Montgomery Reduction Algorithm, the values generated by the adders in column X₁ depend on the data at a particular data bit location. The particular data bit location in this instance corresponds to the output S of adder 92. It should be noted that an input of AND-gate 134 receives a logic zero value from the signal at output S of adder 92. Latch 132 is not set, and the Q output of latch 132 remains at a logic zero value. AND-gates 99, 101, 103, and 105 generate a logic zero value at the second inputs of adders 100, 102, 104, and 106, respectively. Adder 100 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Likewise, adder 102 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 104 has a logic one value at the first input and a logic zero value at the second input and generates a logic one value at output S. Adder 106 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Thus, adders 106, 104, 102, and 100 in column X₁ generate a respective value of 0100.

The values generated by the adders in column X₂ also depend on the data at a particular data bit location. The particular data bit in this instance is the logic value at the output of adder 104. It should be noted that an input of AND-gate 138 receives a logic one value from the signal at output S of adder 104. The logic one value at the output of AND-gate 138 causes latch 136 to set and the Q output of latch 136 to have a logic one value. AND-gates 109, 111, 113, and 115 are enabled by the logic one value generated by latch 136. Thus, the aligned values of N passed by AND-gates 109, 111, 113, and 115 are transferred to the second inputs of adders 110, 112, 114, and 116, respectively, while the first inputs of those adders receive the data at the outputs of adders 100, 102, 104, and 106. Adder 110 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Likewise, adder 112 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 114 has logic one values at both the first and second inputs and generates a logic zero value at output S and a logic one value for the carry-out signal at output CO. Adder 116 has logic zero values at both the first and second inputs, and a logic one value is transferred through AND-gate 114A to the carry input of adder 116. A logic one value is generated at output S of adder 116. Thus, adders 116, 114, 112, and 110 in column X₂ generate a respective value of 1000.

The values generated by adders 120, 122, 124, and 126 in column X₃ also depend on the data at a particular data bit location. The particular data bit in this instance is the logic value at the output of adder 116. An input of AND-gate 142 receives a logic one value from the signal at output S of adder 116. AND-gate 142, having a logic one value from adder 116 and a logic one value for the signal T, causes latch 140 to set. The Q output of latch 140 has a logic one value, which enables AND-gates 119, 121, 123, and 125. The data at the outputs of adders 110, 112, 114, and 116 is transferred to the first inputs of adders 120, 122, 124, and 126, respectively. Adder 120 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Likewise, adder 122 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 124 also has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 126 has logic one values at both the first and second inputs and generates a logic zero value at output S and a logic one value as the carry-out signal at the carry output. Thus, adders 126, 124, 122, and 120 in column X₃ generate a respective value of 0000 at output terminals 164-170.

During the reduction process that causes the first partial product of A₀ and B₀ to have a value of zero, the appropriate latches among latches 128, 132, 136, and 140 have been set and contain the value 1101 for μ that is used in subsequent pipelined multiplications. Following the reduction of the first partial product to zero, the signal T transitions from a logic one to a logic zero value, and the value of μ is stored in latches 128, 132, 136, and 140. The stored value of μ, the next digit of N, and the products of the digits B₁-B₆₃ with A₁-A₆₃ are used by modulo reducer 60 to complete the integer-modulo-N multiplication.
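The worked example can be replayed with a short trace that records the four-bit value leaving each column. The model assumes, consistent with the walkthrough, that column X_i inspects bit i of the running sum and adds N aligned to bit i when that bit is one. Bits that overflow the four-bit window stand in for the carries and shifted N bits that the hardware holds in latches for the next digit; that correspondence is an interpretation, not something the text states. `column_trace` is a hypothetical name.

```python
def column_trace(p, n, width):
    """Return (mu, windows, high) for a width x width reducer array.

    windows[i] is the width-bit value leaving column X_i; high is the part
    of p + mu*n above the window, i.e. (p + mu*n) >> width.
    """
    mask = (1 << width) - 1
    acc, mu, windows = p, 0, []
    for i in range(width):
        if (acc >> i) & 1:          # column X_i sees a one at bit i
            acc += n << i           # add N aligned to bit i
            mu |= 1 << i            # latch for column X_i is set
        windows.append(acc & mask)  # value at the outputs of column X_i
    return mu, windows, acc >> width

mu, windows, high = column_trace(0b0111, 0b1101, 4)
print(format(mu, "04b"), [format(w, "04b") for w in windows], high)
# 1101 ['0100', '0100', '1000', '0000'] 11
```

The column values 0100, 0100, 1000, and 0000 and the reduction value 1101 match the walkthrough. The part above the window, (0111 + 1101·1101)/16 = 176/16 = 11, equals 9·11·16 mod 13, the Montgomery form of 9·11 mod 13.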

FIG. 4 is a schematic diagram showing a multiplier structure 171 as a portion of another embodiment of single arithmetic processor 22 of FIG. 2. Multiplier structure 171 performs mathematical operations in support of integer-modulo-N multiplications and modular polynomial-basis multiplications. For illustrative purposes, multiplier structure 171 is described in simplified form as a four-by-four array of adders. Although multiplier structure 171 is described as an array of adders having the same number of rows and columns, this is not a limitation of the present invention.

Multiplier structure 171 has adders 90, 92, 94, and 96 in column X₀; adders 100, 102, 104, and 106 in column X₁; adders 110, 112, 114, and 116 in column X₂; and adders 120, 122, 124, and 126 in column X₃. In addition, latches 152, 156, 160, and 162 store carry-out signals that are used in computing integer-modulo-N multiplications for generating the next partial product. Latches 150, 154, and 158 store data bits of operand B, and latches 226, 228, and 230 store data bits of the value N for use in generating the next partial product.

The multiplexers (muxes) in multiplier structure 171 each have four inputs, an output, and two selector inputs. Multiplexers 172-178, 182-188, 192-198, and 202-208 are illustrated as having outputs connected to the first inputs of the adders, although it should be noted that the outputs of the multiplexers could be connected to the second inputs of the adders. The signals on the first and second selector inputs of the muxes select a signal at one of the four mux inputs for transfer to the mux output. The output signals from muxes 172-178 are transferred to the first inputs of adders 90-96, respectively. The output signals from muxes 182-188 are transferred to the first inputs of respective adders 100-106. The output signals from muxes 192-198 are transferred to the first inputs of adders 110-116, respectively. The output signals from muxes 202-208 are transferred to the first inputs of adders 120-126, respectively.

Further, the first selector inputs of muxes 172-178 are commonly connected to each other and receive the signal A_((BIT 0)). The second selector inputs of muxes 172-178 are commonly connected to each other and to an output of a latch 212. The first selector inputs of muxes 182-188 are commonly connected to each other and receive the signal A_((BIT 1)). The second selector inputs of muxes 182-188 are commonly connected to each other and to an output of a latch 216. The first selector inputs of muxes 192-198 are commonly connected to each other and receive the signal A_((BIT 2)). The second selector inputs of muxes 192-198 are commonly connected to each other and to an output of a latch 220. The first selector inputs of muxes 202-208 are commonly connected to each other and receive the signal A_((BIT 3)). The second selector inputs of muxes 202-208 are commonly connected to each other and to an output of a latch 224.

A first input of each of muxes 172-178, 182-188, 192-198, and 202-208 is commonly coupled for receiving a logic zero value. The second inputs of muxes 172, 174, 176, and 178 receive the respective values B_((BIT 0)), B_((BIT 1)), B_((BIT 2)), and B_((BIT 3)). The third inputs of muxes 172, 174, 176, and 178 receive the respective values N_((BIT 0)), N_((BIT 1)), N_((BIT 2)), and N_((BIT 3)). The fourth inputs of muxes 172, 174, 176, and 178 receive the summed value of the respective values for N and B. Thus, the fourth input of each mux receives the logical summed value of the values supplied at the second and third inputs of that mux.

When the first and second selector inputs of the muxes receive respective logic values of 00, the signals at the first inputs of muxes 172-178, 182-188, 192-198, and 202-208 are transferred to the outputs of the corresponding muxes. When the first and second selector inputs receive respective logic values of 01, the signals at the second inputs of muxes 172-178, 182-188, 192-198, and 202-208 are transferred to the outputs of the corresponding muxes. When the first and second selector inputs receive respective logic values of 10, the signals at the third inputs of muxes 172-178, 182-188, 192-198, and 202-208 are transferred to the outputs of the corresponding muxes. When the first and second selector inputs receive respective logic values of 11, the signals at the fourth inputs of muxes 172-178, 182-188, 192-198, and 202-208 are transferred to the outputs of the corresponding muxes.

Latches 212, 216, 220, and 224 latch a data signal from respective logic circuits 210, 214, 218, and 222 when the signal T transitions from a logic one to a logic zero value. The data signal generated by logic circuit 210 is the product of the signals A_((BIT 0)) and B_((BIT 0)) exclusive-ORed with P(0), where P(0) is the least significant bit of the previous partial product value. The data signal generated by logic circuit 214 is the product of the signals A_((BIT 1)) and B_((BIT 0)) exclusive-ORed with the output signal from adder 92. The data signal generated by logic circuit 218 is the product of the signals A_((BIT 2)) and B_((BIT 0)) exclusive-ORed with the output signal from adder 104. The data signal generated by logic circuit 222 is the product of the signals A_((BIT 3)) and B_((BIT 0)) exclusive-ORed with the output signal from adder 116.
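Taken together, the four-input muxes and logic circuits 210-222 suggest a radix-2 Montgomery multiplication in which each row adds 0, N, B, or B+N depending on one bit of A and one bit of μ. The sketch below assumes that reading: the μ bit q_i is A_((BIT i))·B_((BIT 0)) XOR the least significant bit of the running sum, the integer case uses ordinary addition, and the polynomial case uses XOR (carry-free) addition, including for B+N. It models the row arithmetic only, not the selector encoding, latching, or digit pipelining, and `clmul`, `pmod`, and `montgomery_multiply` are hypothetical helper names.

```python
def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, n):
    """Remainder of the GF(2)[x] polynomial a modulo n."""
    dn = n.bit_length()
    while a.bit_length() >= dn:
        a ^= n << (a.bit_length() - dn)
    return a

def montgomery_multiply(a, b, n, k, poly=False):
    """Radix-2 Montgomery product.

    Integer mode:    a*b*2^-k mod n        (n odd, b < n, a < 2^k).
    Polynomial mode: a(x)b(x)x^-k mod n(x) (n(0) = 1, deg b < deg n, deg a < k).
    """
    add = (lambda x, y: x ^ y) if poly else (lambda x, y: x + y)
    addend = [0, n, b, add(b, n)]        # indexed by (A bit, q bit)
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1
        q_i = (a_i & b & 1) ^ (s & 1)    # A(BIT i)*B(BIT 0) XOR LSB of running sum
        s = add(s, addend[(a_i << 1) | q_i]) >> 1   # LSB is now zero: exact shift
    if not poly and s >= n:
        s -= n                           # final correction, integer mode only
    return s

print(montgomery_multiply(1, 7, 13, 4))                       # 11
print(montgomery_multiply(0b0110, 0b0011, 0b10011, 4, True))  # 6, i.e. x^2 + x
```

With the blocking gates modeled as XOR addition, the same loop serves both modes, and the polynomial result needs no final subtraction because its degree never reaches that of N(x).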

AND-gates 90A-96A are located in the carry chain path of the adders in column X₀. Thus, AND-gates 90A-96A either enable or disable signals from propagating in the carry chain of column X₀. Likewise, AND-gates 100A-106A are located in the carry chain path of the adders in column X₁ and either enable or disable signals from propagating in the carry chain of column X₁. AND-gates 110A-116A are located in the carry chain path of the adders in column X₂ and either enable or disable signals from propagating in the carry chain of column X₂. AND-gates 120A-126A are located in the carry chain path of the adders in column X₃ and either enable or disable signals from propagating in the carry chain of column X₃. Each AND-gate 90A-96A, 100A-106A, 110A-116A, and 120A-126A is enabled when multiplier structure 171 is computing integer-modulo-N multiplications and disabled when multiplier structure 171 is computing modular polynomial-basis multiplications. In other words, the carry chain paths of multiplier structure 171 only propagate carry chain signals to adjacent adder cells when integer-modulo-N multiplications are being computed.
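The effect of the carry-chain AND-gates can be sketched in Python. The function `column_add` is a hypothetical behavioral model of one column of adder cells, not a circuit taken from the figures: with the gates enabled, carries ripple from cell to cell as in ordinary binary addition; with the gates disabled, every carry-in is zero and each cell's sum output reduces to an exclusive-OR, which is addition in GF(2)[x]. The sketch assumes the carry out of the top cell is simply discarded.

```python
def column_add(p: int, n: int, width: int, int_mode: bool) -> int:
    """Hypothetical model of one column of `width` adder cells adding n to p.

    int_mode=True : carry-chain AND-gates enabled, carries ripple upward.
    int_mode=False: AND-gates disabled, every carry-in is zero, so each
                    cell's sum output is just the exclusive-OR of its inputs.
    The carry out of the top cell is discarded.
    """
    result = 0
    carry_in = 0
    for i in range(width):
        a = (p >> i) & 1
        b = (n >> i) & 1
        result |= (a ^ b ^ carry_in) << i          # full-adder sum output S
        carry_out = (a & b) | (carry_in & (a ^ b))  # full-adder carry output CO
        carry_in = carry_out if int_mode else 0     # AND-gate in the carry path
    return result


print(format(column_add(0b1111, 0b1011, 4, int_mode=False), "04b"))  # 0100
print(format(column_add(0b1111, 0b1011, 4, int_mode=True), "04b"))   # 1010
```

With P = 1111 and N = 1011, the values that column X₀ receives in the polynomial example, polynomial mode yields 0100, matching the outputs of adders 96, 94, 92, and 90, while integer mode yields the 4-bit integer sum 1010.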

The multiplication process that generates the partial product of digits A₀ and B₀ causes the logic values at output terminals 164-170 to be reduced to zero. Thus, the partial product that results from digit A₀ times digit B₀ has all logic zero values. In addition, latches 128, 132, 136, and 140 have been appropriately set and store the value for μ during the multiplication of A₀ and B₀. During subsequent multiply operations, the stored value of μ, along with corresponding values of N₁-N₆₃, digits B₁-B₆₃, and digits A₁-A₆₃, is used by multiplier structure 171 to complete the mathematical computations for integer-modulo-N multiplications.

Referring to FIG. 4, the following example uses the arithmetic process for modular polynomial multiplication. The Montgomery Reduction Algorithm for polynomial multiplication takes the form of:

(A*R mod N)(B*R mod N)+μ*N

where:

A is the first operand and a polynomial;

B is the second operand and a polynomial;

N is an irreducible polynomial;

mod N is a remainder value of (A*B*R)/N that defines the number of elements in the finite field;

R is an integer power of two having a value greater than the value of N; and

μ is a reduction value that is computed such that (A*R mod N)(B*R mod N)+μ*N is an integer that can be divided by R without a loss of significant bits.
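As a worked equation, and assuming that R and N share no common factor (N is odd in the integer case; in the polynomial case N is irreducible and not equal to x), the reduction value μ makes the division by R exact while leaving the residue modulo N unchanged apart from the factor R⁻¹:

```latex
\frac{(AR \bmod N)(BR \bmod N) + \mu N}{R}
  \equiv (AR)(BR)\,R^{-1}
  \equiv ABR \pmod{N}
```

The term μN is a multiple of N and so drops out modulo N, and because the numerator is an exact multiple of R, dividing by R is the same as multiplying by R⁻¹ modulo N. For polynomials, "+" is exclusive-OR and R is a power of x.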

An example is described where the term (A*R mod N) has the value (x⁶+x⁴) mod N = x+1, or 011 in base two, when A=5 (base ten) or (x²+1) in polynomial form, R=16 (base ten) or (x⁴) in polynomial form, and N=11 (base ten) or (x³+x+1) in polynomial form. Further, the term (B*R mod N) has the value (x⁶) mod N = x²+1, or 101 in base two, when B=4 (base ten) or (x²) in polynomial form, R=16 (base ten), and N=11 (base ten). Note that digits A₀ and B₀ are pre-multiplied by R to simplify a hardware modular reduction problem. When the operands (A*R mod N) and (B*R mod N) are multiplied, the product terms P_(i+3), P_(i+2), P_(i+1), and P_(i+0) have the respective values 1111. Multiplier structure 171 reduces the product [(A*R mod N)*(B*R mod N)] mod N by R, which results in a value of 0111 or (x²+x+1) in polynomial form.
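The values in this example can be checked with a short Python sketch that treats each polynomial as a bit mask, with bit k holding the coefficient of x^k. The helper names `clmul` (carry-less multiply) and `polymod` (polynomial remainder) are illustrative only and are not part of the described hardware.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]; bit k holds the coefficient of x^k."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def polymod(a: int, n: int) -> int:
    """Remainder of a modulo n in GF(2)[x], by repeated aligned XOR."""
    deg_n = n.bit_length() - 1
    while a.bit_length() - 1 >= deg_n:
        a ^= n << (a.bit_length() - 1 - deg_n)
    return a


N = 0b1011   # x^3 + x + 1   (11 in base ten)
R = 0b10000  # x^4           (16 in base ten)
A = 0b101    # x^2 + 1       (5 in base ten)
B = 0b100    # x^2           (4 in base ten)

ar = polymod(clmul(A, R), N)             # (A*R mod N)
br = polymod(clmul(B, R), N)             # (B*R mod N)
product = clmul(ar, br)                  # unreduced product of the two operands
abr = polymod(clmul(clmul(A, B), R), N)  # A*B*R mod N
print(format(ar, "03b"), format(br, "03b"), format(product, "04b"), format(abr, "04b"))
# prints: 011 101 1111 0111
```

The printed values reproduce the example: 011 and 101 for the two operands, the unreduced product 1111, and 0111 (x²+x+1) for the result reduced by R.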

Initially, a reset signal at terminal 79 causes the Q outputs of latches 128, 132, 136, and 140 to have logic zero values. AND-gate 130 receives the product term P_(i+0), having a logic one value, at one input and the signal T, having a logic one value, at the other input. The output of AND-gate 130 generates a logic one value that causes latch 128 to set, i.e., the signal at the Q output has a logic one value. It should be noted that the signal T has a logic one value during the time that digits A₀ and B₀, i.e., the lower order segment of operands A and B, are multiplied together. It should be further noted that the logic one value at the Q output of latch 128 causes AND-gates 89, 91, 93, and 95 to be enabled and pass the values N_(i+0), N_(i+1), N_(i+2), and N_(i+3) to the second inputs of adders 90, 92, 94, and 96, respectively. Thus, the adders located in column X₀ generate output signals that are the sum of the values N_(i+0), N_(i+1), N_(i+2), and N_(i+3) and the corresponding values of P_(i+0), P_(i+1), P_(i+2), and P_(i+3).

The logic one values at the first and second inputs of adder 90 cause output S to supply a logic zero value. Further, adder 90 generates a carry signal at output CO. Adder 92 receives a logic one value at the first input, a logic one value at the second input, and a logic zero value for the carry signal at input CI (AND-gate 90A blocks the carry signal generated by adder 90 from propagating to adder 92). The signal at output S of adder 92 has a logic zero value and the carry signal at output CO has a logic one value. It should be noted that AND-gate 92A blocks the carry signal generated by adder 92 from propagating to adder 94.

Adder 94 receives a logic one at the first input, a logic zero at the second input from AND-gate 93, and a logic zero for the carry signal. The output S of adder 94 has a logic one value and the carryout signal has a logic zero value at the carry output CO. Likewise, adder 96 receives a logic one at the first input, a logic one at the second input from AND-gate 95, and a logic zero value for the carry signal. The output signal at output S of adder 96 has a logic zero value and the carry signal at the carry output CO has a logic one value. In accordance with the Foster-Montgomery Reduction Algorithm, the particular bit location having a logic one value, i.e., the least significant bit location at input terminal 80, causes the value N to be aligned and added to the value P.

According to the Foster-Montgomery Reduction Algorithm, the data generated by the adders in column X₁ have values that depend on the data at a particular data bit location. The particular data bit location in this instance corresponds with the output S of adder 92. It should be noted that an input of AND-gate 134 receives a logic zero value from the signal at output S of adder 92. Latch 132 is not set and the Q output of latch 132 remains a logic zero value. AND-gates 99, 101, 103, and 105 generate logic zero values at the second inputs of adders 100, 102, 104, and 106, respectively. Adder 100 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Likewise, adder 102 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 104 has a logic one value at the first input and a logic zero value at the second input and generates a logic one value at output S. Adder 106 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Thus, adders 106, 104, 102, and 100 in column X₁ generate a respective value of 0100.

The data generated by the adders in column X₂ have values that also depend on the data at a particular data bit location. The particular data bit location in this instance corresponds with the output S of adder 104. It should be noted that an input of AND-gate 138 receives a logic one value from the signal at output S of adder 104. The logic one value at the output of AND-gate 138 causes latch 136 to set and the Q output of latch 136 to have a logic one value. AND-gates 109, 111, 113, and 115 are enabled by the logic one value generated by latch 136. The data at the outputs of adders 100, 102, 104, and 106 is transferred to the first inputs of adders 110, 112, 114, and 116, respectively. Adder 110 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Likewise, adder 112 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 114 has logic one values at both the first and second inputs and generates a logic zero value at output S and a logic one value for the carryout signal at output CO. The logic one value for the carryout signal is inhibited by AND-gate 114A from propagating to adder 116. Adder 116 has a logic zero value at the first input, a logic one value at the second input, and a logic zero value for the carry input. A logic one value is generated at output S of adder 116. Thus, adders 116, 114, 112, and 110 in column X₂ generate a respective value of 1000.

The data generated by adders 120, 122, 124, and 126 in column X₃ have values that also depend on the data at a particular data bit location. The particular data bit location in this instance corresponds with the output S of adder 116. An input of AND-gate 142 receives a logic one value from the signal at output S of adder 116. AND-gate 142, having a logic one value from adder 116 and a logic one value for the signal T, causes latch 140 to set. The Q output of latch 140 has a logic one value, which enables AND-gates 119, 121, 123, and 125. The data at the outputs of adders 110, 112, 114, and 116 is transferred to the first inputs of adders 120, 122, 124, and 126, respectively. Adder 120 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Likewise, adder 122 has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 124 also has logic zero values at both the first and second inputs and generates a logic zero value at output S. Adder 126 has logic one values at both the first and second inputs and generates a logic zero value at output S and a logic one value as the carryout signal at the carry output. AND-gate 126A inhibits the carryout signal from propagating to a latch 162. Thus, adders 126, 124, 122, and 120 in column X₃ generate a respective value of 0000 at output terminals 164-170.

During the reduction process that occurs in the first multiplication cycle, the bits of the partial product of digits A₀ and B₀ are reduced to zero. Latches 128, 132, 136, and 140 have been set and contain the value for μ of 1101 that is used in subsequent pipelined multiplications for determining the product of operands A and B. Following the reduction of the first partial product to zero, the signal T transitions from a logic one to a logic zero value, and the value of μ is stored in latches 128, 132, 136, and 140. The stored value of μ, a value for N_((i+3)), N_((i+2)), N_((i+1)), and N_((i+0)) of 0000, and a value for P_((i+3)), P_((i+2)), P_((i+1)), and P_((i+0)) of 0000 are used by multiplier structure 171 to complete the polynomial reduction process. The signals at output terminals 170, 168, 166, and 164 have a respective value of 0111, i.e., a value represented as (x²+x+1) in polynomial form, after the second multiplication cycle has completed.
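The column-by-column reduction and the latching of μ can be modeled in a few lines of Python. This is a behavioral sketch, not the gate-level circuit: `reduce_digit` is a hypothetical name, each column's adders are modeled as a single exclusive-OR with the aligned value of N (carry propagation is disabled in polynomial mode), and the two multiplication cycles are collapsed into one step that returns the reduced value directly.

```python
def reduce_digit(p: int, n: int, digit_bits: int):
    """Polynomial-mode reduction of one digit, column by column.

    For each column j, bit j of the running value decides whether the
    aligned copy of n is added (XORed); that decision bit is latched as
    bit j of mu. Returns (mu, column_outputs, reduced_value).
    """
    mu = 0
    columns = []
    for j in range(digit_bits):
        if (p >> j) & 1:
            mu |= 1 << j   # latch this bit of mu
            p ^= n << j    # add N aligned at bit j (no carries)
        # Low digit_bits of the running value, as seen at one column's outputs.
        columns.append(p & ((1 << digit_bits) - 1))
    # The low digit is now zero, so dividing by R = x^digit_bits is a right shift.
    return mu, columns, p >> digit_bits


mu, columns, result = reduce_digit(0b1111, 0b1011, 4)
print(format(mu, "04b"), [format(c, "04b") for c in columns], format(result, "04b"))
# prints: 1101 ['0100', '0100', '1000', '0000'] 0111
```

The four column outputs, 0100, 0100, 1000, and 0000, match the adder outputs described for columns X₀ through X₃; μ is 1101 and the reduced value is 0111.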

Briefly referring to FIG. 4, the modular polynomial multiplication of (A*R mod N) and (B*R mod N) produces the same binary product as found using the circuitry of FIG. 3. When calculating modular polynomial-basis multiplications, AND-gates 90A-96A, 100A-106A, 110A-116A, and 120A-126A are not enabled. Therefore, adders 90-96, adders 100-106, adders 110-116, and adders 120-126 do not propagate a carryin signal to adjacent adder cells. The disabled AND-gates cause a logic zero value to be supplied at each of the CI terminals.

During the first multiplication cycle, the reduction process causes a value of 0000 to be generated as the first partial product of digits A₀ and B₀ at output terminals 170, 168, 166, and 164. In addition, latches 224, 220, 216, and 212 are set during the generation of the first partial product, and the latches retain the value for μ of 1101 that is used in subsequent pipelined multiplications. During the second multiplication cycle, the signals generated at output terminals 170, 168, 166, and 164 have a respective binary value of 0111, or a value of (x²+x+1) in polynomial form.

It should be noted that the architecture of multiplier structure 171 allows the value of μ to be determined and stored in latches 212, 216, 220, and 224. In other words, the value of μ is not calculated prior to the multiplication of the operands A and B; rather, the value of μ is determined and latched during the cycle that multiplies the digits A₀ and B₀. The latched value of μ is used during the multiplication of the other digits in the pipelined process that determines the full product of the operands A and B.

FIG. 5 is a schematic diagram showing a portion of a multiplier 232 for computing modular polynomial-basis multiplications. Briefly referring to FIG. 4, AND-gates 90A-96A, 100A-106A, 110A-116A, and 120A-126A are not enabled when multiplier structure 171 is used for computing modular polynomial-basis multiplications. Therefore, adder cells do not receive a carryin signal from the carryout (CO) terminal of an adjacent adder cell. Accordingly, the full adder cell of adders 90-96, 100-106, 110-116, and 120-126 can be replaced by a half adder cell as illustrated in FIG. 5. The letter “H” has been appended to the reference number of the exclusive-OR gates used as the half adder cells.

For the example where (A*R mod N)=(x⁶+x⁴) mod N, (B*R mod N)=(x⁶) mod N, A=(x²+1), B=(x²), R=(x⁴), and N=(x³+x+1), the polynomial multiplication of (A*R mod N) and (B*R mod N) produces a value of 0000 at the respective output terminals 170, 168, 166, and 164 during the first multiplication cycle. Thus, the first partial product is reduced to zero, and the value of μ is determined as having a value of 1101 and stored in respective latches 224, 220, 216, and 212. The stored value of μ is used during subsequent multiplication cycles that generate the full product of operands A and B. The signals at output terminals 170, 168, 166, and 164 have a respective binary value of 0111, i.e., a value of (x²+x+1) in polynomial form, during the second multiplication cycle.

FIG. 6 is a block diagram that illustrates a 1×M multiplier 240 for computing either integer-modulo-N multiplications or modular polynomial-basis multiplications, where M is the number of multiplier cells. Multiplier 240 has a B-register 242 for storing operand B, an A-register 244 for storing operand A, a C-register 246 for computing and storing a product value, and an N-register 248 for storing a value of N. Although a reset line is not shown, C-register 246 is initially cleared prior to the first multiplication cycle. It should be noted that N-register 248 stores a binary value having an odd integer value when multiplier 240 computes integer-modulo-N multiplications and a binary value for an irreducible polynomial when multiplier 240 computes modular polynomial-basis multiplications. Registers 242-248 are illustrated in FIG. 6 as M-bit wide registers.

B-register 242, in the preferred embodiment, is a shift register that shifts the data stored in that register either to the left or to the right. By way of example, B-register 242 shifts data to the right when multiplier 240 computes integer-modulo-N multiplications, i.e., data bits of B-register 242 are transferred to mux 250 starting with the least significant data bits of B-register 242. On the other hand, B-register 242 shifts data to the left when multiplier 240 computes modular polynomial-basis multiplications, i.e., data bits of B-register 242 are transferred to mux 250 starting with the most significant data bits of B-register 242. The clock signals used to latch values in B-register 242, A-register 244, C-register 246, and N-register 248 are not shown in FIG. 6. Also, the bus lines connected to inputs and outputs of each register that allow data to be transferred to and retrieved from the registers are not shown.

Multiplier 240 computes either integer-modulo-N multiplications or modular polynomial-basis multiplications based on the logic state of the signal at the INT/POLY input. The INT/POLY input is connected to the select input of a multiplexer (mux) 250, to B-register 242, and to an input of the adder cells of C-register 246 (see input INT/POLY in FIGS. 7 and 8). Thus, when the signal at the INT/POLY input causes multiplier 240 to compute modular polynomial-basis multiplications, B-register 242 operates to shift data to the left, presenting the data from the most significant data-bit position of B-register 242 through mux 250 to inputs of C-register 246. When multiplier 240 computes integer-modulo-N multiplications, B-register 242 operates to shift data to the right, presenting the data from the least significant data-bit position of B-register 242 through mux 250 to inputs of C-register 246.
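The effect of the INT/POLY input on the order in which B-register 242 presents bits through mux 250 can be sketched as a Python generator. The name `b_bit_stream` and the `int_mode` flag are hypothetical stand-ins for the register, the mux, and the INT/POLY signal.

```python
from typing import Iterator


def b_bit_stream(b: int, m: int, int_mode: bool) -> Iterator[int]:
    """Hypothetical model of B-register 242 feeding mux 250, one bit per cycle.

    int_mode=True : shift right, least significant bit first.
    int_mode=False: shift left, most significant bit first.
    """
    positions = range(m) if int_mode else range(m - 1, -1, -1)
    for i in positions:
        yield (b >> i) & 1


print(list(b_bit_stream(0b10101, 5, int_mode=True)))   # [1, 0, 1, 0, 1]
print(list(b_bit_stream(0b11101, 5, int_mode=False)))  # [1, 1, 1, 0, 1]
```

The two calls use the B operands of the integer and polynomial examples in this section, so the bit sequences correspond to B(0) through B(4) and B(4) through B(0), respectively.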

FIG. 7 is a schematic diagram of a cell 270 that is used in C-register 246 of multiplier 240 (FIG. 6) for single-cycle multiplication operations. Although multiplier 240 is illustrated as a ripple-carry multiplier, it should be understood that multiplier 240 could be implemented as a carry-save multiplier. Cells C_((n−1)), C_((n−2)), . . . , and C₀ of C-register 246 incorporate cell 270 in computing modular polynomial-basis and integer-modulo-N multiplications. A logic zero at the INT/POLY input of cell 270 causes cell 270 to compute modular polynomial-basis multiplications. In that mode, latch 262 in cell 270 latches the “ith” bit, storing the value $A_i B_i \oplus N_i C_{\mathrm{HIGH}} \oplus C_{i-1}$, where $A_i$, $B_i$, and $N_i$ are values stored at a particular bit location (designated as bit location i) of A-register 244, B-register 242, and N-register 248, respectively. $C_{\mathrm{HIGH}}$ is the value of the most significant data bit that is stored in C-register 246. $C_{i-1}$ is the previous partial product value that is stored in the register cell that is adjacent to the “ith” bit in C-register 246; no carry term appears because carry propagation is disabled when modular polynomial-basis multiplications are computed.
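A bit-level Python sketch of C-register 246 in polynomial mode follows. It assumes that each cell latches A_i·B ⊕ N_i·C_HIGH ⊕ C_(i−1), using the adjacent cell's previous partial product bit C_(i−1) as the shifted-in term because no carry propagates in this mode, and that the M-bit N-register holds the irreducible polynomial without its x^M term. Both are assumptions made for illustration; `c_register_step_poly` and `N_LOW` are hypothetical names. The operands are those of the polynomial-basis example worked through later in this section (A = 10110, B = 11101, N = 100101, M = 5).

```python
def c_register_step_poly(c: int, a: int, n_low: int, b_bit: int, m: int) -> int:
    """One clock of an m-cell C-register in polynomial mode (behavioral sketch).

    Each cell i latches A_i*B XOR N_i*C_HIGH XOR C_(i-1), with C_(-1) = 0.
    n_low holds the irreducible polynomial without its x^m term (assumption).
    """
    c_high = (c >> (m - 1)) & 1
    new_c = 0
    for i in range(m):
        c_prev = (c >> (i - 1)) & 1 if i > 0 else 0
        bit = (((a >> i) & 1) & b_bit) ^ (((n_low >> i) & 1) & c_high) ^ c_prev
        new_c |= bit << i
    return new_c


M = 5
A = 0b10110      # x^4 + x^2 + x
B = 0b11101      # x^4 + x^3 + x^2 + 1
N_LOW = 0b00101  # x^5 + x^2 + 1 with the x^5 term implied

c = 0                            # C-register cleared before the first cycle
history = []
for k in range(M - 1, -1, -1):   # B is presented most significant bit first
    c = c_register_step_poly(c, A, N_LOW, (B >> k) & 1, M)
    history.append(c)
print([format(v, "05b") for v in history])
# prints: ['10110', '11111', '01101', '11010', '00111']
```

The register contents after each cycle agree with the partial products (3), (8), (13), (16), and (21) of that example once their leading zeros are dropped, ending at 00111 (x² + x + 1).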

On the other hand, when multiplier 240 is selected for computing integer-modulo-N multiplications, latch 262 of cell 270 latches the value of $A_i B_i \oplus \mathrm{CARRYIN}_{i-1} \oplus \mathrm{CARRYIN}_{i-2} \oplus C_{i-1} \oplus N_i C_{\mathrm{LOW}}$, where $A_i$, $B_i$, and $N_i$ are values stored at the “ith” bit location of A-register 244, B-register 242, and N-register 248, respectively. $C_{\mathrm{LOW}}$ is the value of the least significant data bit that is stored in C-register 246. $\mathrm{CARRYIN}_{i-1}$ is the carry signal that propagates from the adder cell that is adjacent to the “ith” bit in C-register 246. $\mathrm{CARRYIN}_{i-2}$ is the carry signal propagated from an adder cell that is two cells removed from the “ith” bit in C-register 246. $C_{i-1}$ is a previous partial product value that is stored in a latch that is adjacent to the “ith” bit in C-register 246.

In operation, the multiplication of operand A by operand B in integer form for integer-modulo-N multiplications is accomplished in multiple multiplication cycles. Data is shifted from B-register 242, one data bit each multiplication cycle, to C-register 246. C-register 246 performs the multiplication of operands A and B and reduces that product by multiples of N to generate the value (A*B*R⁻¹ mod N). Thus, in the first multiplication cycle, the least significant data bit of operand B is shifted through mux 250 to C-register 246. In the next multiplication cycle, the shift right operation of B-register 242 causes the next least significant data bit to be transferred through mux 250 to C-register 246. The multiplication process continues until B-register 242 has shifted the stored value of operand B through mux 250, one data bit per multiplication cycle, to C-register 246 and C-register 246 generates the product (A*B*R⁻¹ mod N).
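The integer-mode procedure can be sketched in Python as radix-2 (bit-serial) Montgomery multiplication: one bit of B per cycle, least significant bit first, with N added whenever the current bit of the partial product is one. This is a behavioral model that assumes N is odd and that both operands are smaller than N; `montgomery_bit_serial` is a hypothetical name. It uses the operand values of the worked example that follows (A-register 10110, B-register 10101, N = 11101, M = 5).

```python
def montgomery_bit_serial(a: int, b: int, n: int, m: int) -> int:
    """Radix-2 Montgomery product a*b*2^(-m) mod n, consuming b LSB first.

    Assumes n is odd and a, b < n < 2^m.
    """
    p = 0
    for i in range(m):
        if (b >> i) & 1:
            p += a << i   # add A aligned at bit i
        if (p >> i) & 1:
            p += n << i   # add N aligned at bit i, which clears bit i
    p >>= m               # the m low bits are now zero: divide by R = 2^m
    if p >= n:
        p -= n            # final conditional subtraction
    return p


A, B, N, M = 0b10110, 0b10101, 0b11101, 5
result = montgomery_bit_serial(A, B, N, M)
R_INV = pow(1 << M, -1, N)  # modular inverse of R (Python 3.8+)
print(format(result, "b"), result == A * B * R_INV % N)  # 1001 True
```

The result, 1001, matches the value obtained in the worked example, and the cross-check against A·B·R⁻¹ mod N, computed with Python's built-in `pow`, confirms that the shifted sum is the Montgomery product.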

It should be noted that the multiplication of operand A, having the form (A*R mod N), with operand B, also having the form (B*R mod N), generates the product (A*B*R mod N) in reduced form. In other words, the product is reduced by R. By way of example, the (A*R mod N) term having a value of 10110 is stored in A-register 244, the (B*R mod N) term having a value of 10101 is stored in B-register 242, and the N term having a value of 11101 is stored in N-register 248. Initially, C-register 246 is cleared, causing the previous partial product C_((i−1)) to have a value of zero. In this example, multiplier 240 generates the product (A*B*R mod N) having the value (1001).

Specifically, the first partial product is generated by multiplying the value stored in A-register 244 by the least significant data bit from B-register 242. Thus, A-register 244 has a value (10110) that is multiplied by B(0), i.e., the least significant bit of B, which has a logic one value (10101).

(1) 10110 <== value stored in A-register 244
(2) × 10101 <== B(0), least significant bit of B
(3) 10110 <== first bit multiply

Using the Foster-Montgomery Reduction Algorithm, the logic value of the data in a particular bit location of the partial product determines whether the value of N should be aligned and added to the partial product to reduce the value of the partial product for mod N. When the particular bit location has a logic zero value, the value of N is not added to the partial product. On the other hand, the value of N is added to the partial product when the particular bit location has a logic one value. In this example, the particular bit location is the least significant bit of the first bit multiply (10110). A logic zero value is in this location and, accordingly, the value of N is not added to the first bit multiply (3).

The second bit multiply involves the multiplication of the value stored in A-register 244 by the next least significant bit from B-register 242. Thus, the value in A-register 244 (10110) is multiplied by B(1), i.e., the next least significant data bit of B, which has a logic zero value (10101).

(1) 10110 <== value stored in A-register 244
(4) × 10101 <== B(1), next least significant bit
(5) 00000 <== second bit multiply result

The product of the second bit multiply (5) is summed with the stored previous result (3) to generate the second partial product (6).

(5) 00000 <== second bit multiply
(3) + 10110 <== first partial product
(6) 10110 <== second partial product

In the Foster-Montgomery Reduction Algorithm, the logic value of the particular bit location of the second partial product determines whether the second partial product should be reduced. In this case, the particular bit location is the location just to the left of the least significant data bit (10110). The second data bit has a logic one value and, accordingly, the value of N is aligned and added to the second partial product. In other words, the second partial product is reduced by the addition of N aligned at the particular bit location.

(6) 10110 <== second partial product
(7) + 111010 <== value of N (11101) aligned one bit position to the left
(8) 1010000 <== reduced second partial product

The third bit multiply involves the multiplication of the value stored in A-register 244 by the logic value of B(2), i.e., the value of the data bit located in the third bit location (10101) from the right in B-register 242.

(1) 10110 <== value stored in A-register 244
(9) × 10101 <== B(2), next least significant bit
(10) 10110 <== third bit multiply result

Following the third bit multiply, the product of the third bit multiply (10) is added to the previous result (8) to provide the third partial product (11).

(8) 1010000 <== previous result
(10) + 1011000 <== third bit multiply (10110) aligned two bit positions to the left
(11) 10101000 <== third partial product

The logic value of the particular bit location of the third partial product determines whether the third partial product should be reduced. In this example, the particular bit location is the third bit location from the right (10101000). The third data bit has a logic zero value and, accordingly, the value of N is not aligned and added to the third partial product.

The fourth bit multiply involves the multiplication of the value stored in A-register 244 by the logic value of B(3), i.e., the value of the data bit located in the fourth bit location (10101) from the right in B-register 242.

(1) 10110 <== value stored in A-register 244
(12) × 10101 <== B(3), next least significant bit
(13) 00000 <== fourth bit multiply result

Following the fourth bit multiply, the fourth bit multiply result is added to the third partial product (11) to provide the fourth partial product (14).

(11) 10101000 <== third partial product
(13) + 00000 <== fourth bit multiply result
(14) 10101000 <== fourth partial product

The logic value of the particular bit location of the fourth partial product determines whether the fourth partial product should be reduced. In this example, the particular bit location is the fourth bit location from the right (10101000). The fourth data bit has a logic one value and, accordingly, the value of N is aligned and added to the fourth partial product.

(14) 10101000 <== fourth partial product
(15) + 11101000 <== value of N (11101) aligned three bit positions to the left
(16) 110010000 <== reduced fourth partial product

The fifth bit multiply involves the multiplication of the value stored in A-register 244 by the logic value of B(4), i.e., the value of the data bit located in the fifth bit location (10101) from the right in B-register 242.

(1) 10110 <== value stored in A-register 244
(17) × 10101 <== B(4), next least significant bit
(18) 10110 <== fifth bit multiply result

Following the fifth bit multiply, the fifth bit multiply result is added to the reduced fourth partial product (16) to provide the fifth partial product (19).

(16) 110010000 <== reduced fourth partial product
(18) + 101100000 <== fifth bit multiply result (10110) aligned four bit positions to the left
(19) 1011110000 <== fifth partial product

Again, the logic value of the particular bit location of the fifth partial product determines whether the fifth partial product should be reduced. In this example, the particular bit location is the fifth bit location from the right (1011110000). The fifth data bit has a logic one value and, accordingly, the value of N is aligned and added to the fifth partial product.

(19) 1011110000 <== fifth partial product
(20) + 111010000 <== value of N (11101) aligned four bit positions to the left
(21) 10011000000 <== reduced fifth partial product

After the five bit multiplies, the five low-order bits of the reduced partial product (10011000000) are zero, and discarding them, i.e., dividing by R, leaves 100110, which is greater than the value of N (11101). When the reduced final partial product has a value that is greater than N, the value of N is subtracted from that final partial product. In other words, the value of N (11101) is aligned and subtracted from the reduced partial product (10011000000 − 1110100000 = 100100000), and discarding the five low-order zeros gives the final product (A*B*R mod N) having a value of 1001. It should be noted that the 1×M multiplier 240 has been used in computing this final product.
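The partial products of the worked example can be checked mechanically. The Python sketch below only reuses the numbers printed in steps (3), (8), (11), (16), and (21); the list name `partials` is illustrative. It confirms that after the k-th step the k low-order bits are zero, then applies the right shift by five bit positions and the conditional subtraction of N.

```python
N, M = 0b11101, 5

# Reduced partial products from the worked example, in step order:
# (3), (8), (11), (16), (21).
partials = [0b10110, 0b1010000, 0b10101000, 0b110010000, 0b10011000000]

# After step k the k low-order bits are zero (step (3) needed no reduction
# because bit 0 of the first bit multiply was already zero).
for k, p in enumerate(partials, start=1):
    assert p % (1 << k) == 0, (k, bin(p))

q = partials[-1] >> M            # discard the five zero bits
final = q - N if q >= N else q   # subtract N once if the result exceeds N
print(format(q, "b"), format(final, "b"))  # 100110 1001
```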

The value of μ in the Foster-Montgomery Reduction Algorithm is not computed prior to the multiplication of the operands A and B but, as noted in the previous example, the value of μ is determined while the product of the digits A₀ and B₀ is being reduced. It should be noted that the value for N is odd, i.e., the value of N has a logic one value in the position for the least significant bit. Thus, by adding N to the summed value when the particular bit location has a logic one value, the value (A*B*R mod N) is generated having a number of zeros in the lower bit locations. Put another way, the Foster-Montgomery Reduction Algorithm causes the least significant bit locations to have logic zero values in generating a product that is reduced by the value R.
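Because N is odd, adding N aligned at bit i always clears bit i, so each bit of μ is settled in the cycle that examines that bit location. The Python sketch below collects those bits for the integer example and compares them with μ computed in advance in the conventional way, μ = −(A·B)·N⁻¹ mod R. Here A and B denote the register contents 10110 and 10101; treating the whole 5-bit operand as a single digit is a simplifying assumption, and the variable names are illustrative.

```python
A, B, N, M = 0b10110, 0b10101, 0b11101, 5
R = 1 << M

# mu bits collected on the fly, in the order the reduction decides them.
p, mu = 0, 0
for i in range(M):
    if (B >> i) & 1:
        p += A << i
    if (p >> i) & 1:   # N is odd, so adding N << i clears bit i
        p += N << i
        mu |= 1 << i

# Conventional Montgomery reduction computes mu up front as -(A*B) * N^(-1) mod R.
mu_precomputed = (-(A * B) * pow(N, -1, R)) % R
print(format(mu, "05b"), format(mu_precomputed, "05b"), (A * B + mu * N) % R)
# prints: 11010 11010 0
```

Both methods give μ = 11010, and A·B + μ·N is divisible by R = 2⁵, which is the property that the on-the-fly determination of μ relies on.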

Referring to FIGS. 6 and 7, the product (A*B) mod N can be generated to support ECC (F2^(M) in the polynomial basis), where A and B are finite field elements representing the coordinates of the elliptic curve and N is the irreducible or basis polynomial. The number of multiplication cycles required to generate the product depends, in part, on the number of bits stored in B-register 242. Data is shifted from B-register 242, one data bit at a time, to C-register 246. Thus, C-register 246 performs the multiplication of operands A and B and reduces that product by multiples of N in generating the value A*B mod N. Since a carry signal is not propagated between adder cells when multiplier 240 is computing modular polynomial-basis multiplications, the calculation of modular polynomial-basis multiplications can begin by multiplying the most significant data bit from A-register 244 with the most significant data bit from B-register 242. This eliminates the necessity of putting the operands into the Montgomery format, i.e., A→AR mod N. B-register 242 shifts data bits, starting with the most significant data bits, through mux 250 to C-register 246.
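The polynomial-mode procedure can be sketched in Python as most-significant-bit-first (left-to-right) multiplication with interleaved reduction: shift the partial product up one degree, add A when the current bit of B is one, and add N whenever the x^M coefficient becomes one. Addition is exclusive-OR throughout. The function name is hypothetical, and N is passed as the full degree-M polynomial.

```python
def poly_mulmod_msb_first(a: int, b: int, n: int, m: int) -> int:
    """A*B mod N in GF(2)[x], consuming B most significant bit first.

    n is the full irreducible polynomial of degree m (bit m set), and
    a is assumed to have degree below m. Adding and subtracting are XOR.
    """
    p = 0
    for i in range(m - 1, -1, -1):
        p <<= 1            # previous partial product moves up one degree
        if (b >> i) & 1:
            p ^= a         # add A for this bit of B
        if (p >> m) & 1:
            p ^= n         # clear the x^m term by adding N
    return p


print(format(poly_mulmod_msb_first(0b10110, 0b11101, 0b100101, 5), "b"))  # 111
```

For the operands of the example that follows (A = 10110, B = 11101, N = 100101, M = 5), the sketch returns 111, i.e., x² + x + 1; its intermediate values after each reduction are 10110, 11111, 1101, 11010, and 111.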

The multiplication of the value stored in A-register 244 by the most significant data bit stored in B-register 242, i.e., the value B(4), generates the first partial product. Thus, by way of example, A-register 244 has a binary value 10110 (x⁴+x²+x, in polynomial form) that is multiplied by B(4), i.e., the most significant bit of B (11101), which has a logic one value (x⁴, in polynomial form). The irreducible polynomial N has a value of 100101 (x⁵+x²+1, in polynomial form).

(1) 10110 <== value stored in A-register 244
(2) × 11101 <== B(4), most significant bit
(3) 10110 <== first partial product result

The first partial product is added to a previous partial product, initially having a value of zero based on a reset of C-register 246, providing a summed value of 10110. In the next multiplication cycle, the data in B-register 242 is shifted to the left and the next most significant data bit of B-register 242 is transferred through mux 250 to C-register 246. C-register 246 multiplies the value stored in A-register 244 by the next most significant data bit. Thus, the binary value 10110 (x⁴+x²+x, in polynomial form) is multiplied by B(3), i.e., the second bit from the left of B (11101), which has a logic one value (x³, in polynomial form).

(1) 10110 <== value stored in A-register 244
(4) × 11101 <== B(3), next most significant bit
(5) 10110 <== second bit multiply result

The second bit multiply result (5) is summed with the stored previous result (3), which is first shifted one bit position to the left, to generate the second partial product (6).

(3) 101100 <== first partial product (10110) shifted one bit position to the left
(5) + 10110 <== second bit multiply result
(6) 111010 <== second partial product

The logic value of a particular bit location is tested to determine whether the partial product should be reduced. When the value of the data bit at the particular bit location has a logic one value, the value of N is aligned to that particular bit location and added to the partial product. In this case, the particular bit location is the most significant data bit location of the generated second partial product. The value of the data bit at the particular bit location has a logic one value (111010). Therefore, the value of N is aligned (x³*N) and subtracted from the most significant data bit location.

It should be noted that when computing modular polynomial-basis multiplications, multiplier 240 does not propagate a carry signal and, therefore, the operation of “adding” or “subtracting” is an exclusive-OR of the two values. It should be further noted that the most significant data location of the second partial product is reduced to a zero value by the addition of N.

(6) 111010 <== second partial product
(7) − 100101 <== aligned value of N (x⁸ + x⁵ + x³)
(8) 011111 <== reduced second partial product (x⁷ + x⁶ + x⁵ + x⁴ + x³)

The third bit multiply involves the multiplication of the value stored in A-register 244 by the logic value of B(2), i.e., the value of the data bit located in the third bit location (11101) from the left in B-register 242.

(1) 10110 <== value stored in A-register 244
(9) × 11101 <== B(2), next most significant bit
(10) 10110 <== third bit multiply result

Following the third bit multiply, the product of the third bit multiply (10) is added to the previous result, i.e., the reduced second partial product (8), to provide the third partial product (11).

(8) 0111110 <== reduced second partial product (011111) shifted one bit position to the left
(10) + 10110 <== third bit multiply (x⁶ + x⁴ + x³)
(11) 0101000 <== third partial product (x⁷ + x⁵)

The logic value of the particular bit location of the third partial product determines whether the third partial product should be reduced. In this example, the particular bit location is the second bit location from the left (0101000). The second data bit has a logic one value and, accordingly, the value of N is aligned (x²*N) and subtracted from the third partial product.

(11) 0101000 <== third partial product (x⁷ + x⁵)
(12) − 100101 <== aligned value of N (x⁷ + x⁴ + x²)
(13) 0001101 <== reduced third partial product (x⁵ + x⁴ + x²)

The fourth bit multiply involves the multiplication of the value stored in A-register 244 by the logic value of B(1), i.e., the value of the data bit located in the fourth bit location (11101) from the left in B-register 242.

 (1)    10110  <== value stored in A-register 244
(14)  × 11101  <== B(1), next most significant bit
(15)    00000  <== fourth bit multiply result

Following the fourth bit multiply, the fourth bit multiply result (15) is added to the reduced third partial product (13) to provide the fourth partial product (16).

(13)    0001101   <== reduced third partial product
(15)  +    00000  <== fourth bit multiply result
(16)    00011010  <== fourth partial product (x⁵ + x⁴ + x²)

The logic value of the particular bit location of the fourth partial product determines whether the fourth partial product should be reduced. In this example, the particular bit location is the third bit location from the left (00011010). The third data bit has a logic zero value and, accordingly, the value of N is not added to the fourth partial product.

The fifth bit multiply involves the multiplication of the value stored in A-register 244 by the logic value of B(0), i.e., the value of the data bit located in the fifth bit location (11101) from the left in B-register 242.

 (1)    10110  <== value stored in A-register 244
(17)  × 11101  <== B(0), next (least significant) bit
(18)    10110  <== fifth bit multiply result

Following the fifth bit multiply, the fifth bit multiply result (18) is added to the fourth partial product (16), which required no reduction, to provide the fifth partial product (19).

(16)    00011010   <== fourth partial product
(18)  +     10110  <== fifth bit multiply result
(19)    000100010  <== fifth partial product (x⁵ + x)

The logic value of the particular bit location of the fifth partial product determines whether the fifth partial product should be reduced. In this example, the particular bit location is the fourth bit location from the left (000100010). The fourth data bit has a logic one value and, accordingly, the value of N is aligned and subtracted from the fifth partial product.

(19)    000100010  <== fifth partial product (x⁵ + x)
(20)  −    100101  <== the value of N properly aligned (x⁵ + x² + 1)
(21)    000000111  <== reduced fifth partial product (x² + x + 1)
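The bit multiplies and conditional reductions for B(4) through B(0) can be replayed in a short Python sketch. It assumes, as the polynomial labels in the example imply, that each printed bit string is a window whose leftmost bit is the x⁸ coefficient and whose rightmost bit is the x^j coefficient for the bit B(j) being processed. The function name trace_poly_mulmod is hypothetical; it models the arithmetic only, not the registers.

```python
def trace_poly_mulmod(a, b, n, k):
    """Replay MSB-first bit multiplies with interleaved reduction in GF(2)[x].

    Returns the final remainder and, for each bit of b, the partial product
    before and after the conditional reduction, printed as in the example.
    """
    m = n.bit_length() - 1            # degree of N (5 for x^5 + x^2 + 1)
    top = m + k - 1                   # windows cover degrees top-1 down to j
    p = 0
    rows = []
    for j in range(k - 1, -1, -1):    # j is the index of the B bit being processed
        if (b >> j) & 1:
            p ^= a << j               # bit multiply aligned to x^j; "adding" is XOR
        before = p
        if (p >> (m + j)) & 1:        # the particular bit location for this step
            p ^= n << j               # "subtract" (XOR) the aligned value x^j * N
        width = top - j
        rows.append((format(before >> j, f"0{width}b"),
                     format(p >> j, f"0{width}b")))
    return p, rows


result, rows = trace_poly_mulmod(0b10110, 0b11101, 0b100101, 5)
for before, after in rows:
    print(before, after)
```

The printed rows are `10110 10110`, `111010 011111`, `0101000 0001101`, `00011010 00011010`, and `000100010 000000111`: the first row is the first bit multiply (no reduction needed), and the remaining rows reproduce the pairs (6)/(8), (11)/(13), (16), and (19)/(21).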

The multiplication process continues until B-register 242 has shifted the stored value of operand B through mux 250, one data bit per multiplication cycle, to C-register 246 and C-register 246 has generated the product (A*B mod N). The (A mod N) term, having a binary value of 10110 (x⁴+x²+x in polynomial form), is multiplied with the (B mod N) term, having a binary value of 11101 (x⁴+x³+x²+1 in polynomial form), modulo N = 100101 (x⁵+x²+1), to generate the binary value 000000111 (x²+x+1 in polynomial form).
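As an independent check of the result stated above, the following sketch forms the full carry-less product A*B first and reduces it by N afterwards. This order differs from the interleaved bit-by-bit reduction that the multiplier performs and is used here only for verification; the helper names clmul and poly_mod are ours.

```python
def clmul(a, b):
    """Carry-less product in GF(2)[x]: shifted copies of a are combined with XOR."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(p, n):
    """Remainder of p modulo n in GF(2)[x] (long division using XOR)."""
    dn = n.bit_length() - 1
    while p.bit_length() - 1 >= dn:
        p ^= n << (p.bit_length() - 1 - dn)
    return p


product = clmul(0b10110, 0b11101)                   # x^8 + x^7 + x^4 + x^3 + x^2 + x
print(format(product, "b"))                         # 110011110
print(format(poly_mod(product, 0b100101), "09b"))   # 000000111
```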

FIG. 8 is a schematic diagram of another cell that can be used in all bit locations of C-register 246 of multiplier 240 (FIG. 6) for two-cycle multiplication operations. Referring to FIG. 6, cell 280 (FIG. 8) describes the logic for cells $C_{n-1}, C_{n-2}, \ldots, C_0$ of C-register 246. A logic zero at input INT/POLY of multiplier 240 selects the multiplier for computing modular polynomial-basis multiplications. Referring to FIG. 8, a latch in cell 280 latches the value $A_i \cdot B_i \oplus N_i \cdot C_{HIGH} \oplus C_{i-1}$, where $A_i$, $B_i$, and $N_i$ are values stored at a particular bit location (designated as bit location i) of A-register 244, B-register 242, and N-register 248, respectively. $C_{HIGH}$ is the value of the most significant data bit that is stored in C-register 246. $C_{i-1}$ is the previous partial product from an adder cell that is located adjacent to the “ith” cell in C-register 246.
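A hypothetical bit-level model of the polynomial-mode cell equation may help. It assumes that the B input of every cell in a given cycle is the single B bit shifted out through mux 250 (as in the worked example), that the lowest cell sees $C_{-1} = 0$, and that the N-register supplies the coefficients of N below $x^m$ (the $x^m$ term cancels against $C_{HIGH}$ as it shifts out). Here m is the number of cells, written as n in the text; the function names are ours.

```python
def poly_cell_cycle(c, a, b_bit, n_low):
    """One cycle of an m-cell C-register in polynomial mode; bit lists are LSB first."""
    m = len(c)
    c_high = c[m - 1]                          # C_HIGH: MSB of C before this cycle
    new_c = []
    for i in range(m):
        c_prev = c[i - 1] if i > 0 else 0      # C_(i-1); the lowest cell sees 0
        new_c.append((a[i] & b_bit) ^ (n_low[i] & c_high) ^ c_prev)
    return new_c


def poly_mulmod_cells(a, b, n, m):
    """Compute a*b mod n in GF(2)[x] by clocking m cells once per bit of b."""
    a_bits = [(a >> i) & 1 for i in range(m)]
    n_low = [(n >> i) & 1 for i in range(m)]   # coefficients of N below x^m
    c = [0] * m
    for j in range(m - 1, -1, -1):             # B enters MSB first, one bit per cycle
        c = poly_cell_cycle(c, a_bits, (b >> j) & 1, n_low)
    return sum(bit << i for i, bit in enumerate(c))


print(format(poly_mulmod_cells(0b10110, 0b11101, 0b100101, 5), "05b"))  # 00111
```

With A = 10110, B = 11101, N = 100101, and m = 5, the model ends with C = 00111, the same x² + x + 1 as in (21). The cell can decide the reduction from $C_{HIGH}$, the bit latched in the previous cycle, because adding $A_i \cdot B_i$ never changes the $x^m$ coefficient (A has degree below m).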

On the other hand, when multiplier 240 (FIG. 6) is selected for computing integer-modulo-N multiplications, cell 280 latches the value $A_i \cdot B_i \oplus \mathrm{CARRYIN0}_{i-1} \oplus C_{i-1}$, where $A_i$ and $B_i$ are values stored at a particular bit location (designated as bit location i) of A-register 244 and B-register 242, respectively. $\mathrm{CARRYIN0}_{i-1}$ is the carry signal that propagates from the adder cell that is located adjacent to the “ith” cell in C-register 246. $C_{i-1}$ is a previous partial product value that is stored in the adder cell that is located adjacent to the “ith” cell in C-register 246.
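To illustrate how the INT/POLY select changes what the same full-adder cells compute, here is a hypothetical Python model of a row of cells (the names full_adder and cell_row_add are ours). It gates each incoming carry with an AND gate driven by the select signal, in the manner of the AND-gate blocking circuit recited in claims 3 and 4, and it leaves out the one-position shift between cells (the $C_{i-1}$ input), so it shows only the adder behavior: with INT/POLY = 1 the row performs integer addition, and with INT/POLY = 0 every cell reduces to an exclusive-OR.

```python
def full_adder(x, y, cin):
    """Sum and carry-out of a one-bit full adder."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def cell_row_add(acc, addend, width, int_poly):
    """Add two width-bit words through a row of full-adder cells.

    int_poly = 1 (integer mode) passes each carry to the next cell;
    int_poly = 0 (polynomial mode) forces every carry-in to zero.
    """
    carry = 0
    out = 0
    for i in range(width):
        gated = carry & int_poly               # blocking circuit: AND gate on carry-in
        s, carry = full_adder((acc >> i) & 1, (addend >> i) & 1, gated)
        out |= s << i
    return out                                 # carry-out of the top cell is dropped


print(cell_row_add(0b01111, 0b10110, 6, 1))    # 37 (integer sum 15 + 22)
print(cell_row_add(0b01111, 0b10110, 6, 0))    # 25 (0b11001, bitwise XOR)
```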

If the least significant data bit (LSB) that is latched in C-register 246 (FIG. 6) has a logic one value, then a second multiplication cycle is used to determine $C_i \oplus N_i \oplus \mathrm{CARRYIN0}_{i-1}$ and cause a reduction of the generated partial product. This is indicated by the REDUCED input signal having a logic one value. $N_i$ is a value stored at a particular bit location (designated as bit location i) of N-register 248. Thus, the first multiplication cycle computes the partial product of $A_i \cdot B_i$, and, depending on the calculated partial product, the second multiplication cycle reduces the partial product. A feedback path provides the value of $C_i$ to mux 282 and a conduction path provides the value of $N_i$ through mux 284 to inputs of full adder 286 during the second multiplication cycle. Because the LSB of the partial product is roughly equally likely to be a logic zero or a logic one, the second multiplication cycle is needed, on average, about 50 percent of the time in generating the reduced product (A*B*R⁻¹ mod N).
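The two-cycle scheme can be sketched in software as bit-serial Montgomery multiplication. This is an illustration under assumptions the text does not state: R = 2^k for an odd modulus N below 2^k, bits of B consumed least significant first, and a final conditional subtraction of N to bring the result below N. The function name montgomery_bit_serial is hypothetical; its counter second_cycles records how often the LSB test calls for the reducing cycle.

```python
def montgomery_bit_serial(a, b, n, k):
    """Return (a*b*2^(-k) mod n, number of reducing cycles) for odd n < 2^k."""
    if n % 2 == 0 or n >= (1 << k) or not (0 <= a < n and 0 <= b < n):
        raise ValueError("need odd n < 2**k and 0 <= a, b < n")
    c = 0
    second_cycles = 0
    for i in range(k):                 # one bit of B per iteration, LSB first
        if (b >> i) & 1:
            c += a                     # first cycle: add the A*B_i partial product
        if c & 1:                      # LSB is one: a second cycle adds N
            c += n
            second_cycles += 1
        c >>= 1                        # c is even here, so this divides exactly by 2
    if c >= n:                         # c < 2n at this point; one subtraction suffices
        c -= n
    return c, second_cycles


print(montgomery_bit_serial(10, 7, 13, 4))   # (6, 1)
```

In this example 6·16 = 96 ≡ 70 (mod 13), so the result is A*B*R⁻¹ mod N with R = 16, and the reducing cycle was needed in one of the four bit steps.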

By now it should be appreciated that the present invention provides a cryptographic multiplication system that achieves high performance, low cost, and low power for implementation in an integrated circuit. The hardware multiplier achieves high performance by computing a product of two operands to support the RSA and ECC algorithms. The multiplication system is adaptable to large operands and performs calculations in fewer clock cycles than prior art systems.

What is claimed is:
 1. A multiplier having a plurality of interconnected multiplier cells, wherein a first one of the multiplier cells comprises: a first adder having a data input coupled for receiving a first data signal, a second input, and an output that supplies a data output signal; and a blocking circuit having an input coupled for receiving a carryin signal, an output coupled to the second input of the first adder, and a control input coupled for receiving a select signal, wherein the select signal configures the first one of the multiplier cells to operate in one of an integer-based multiplication mode and a polynomial-based multiplication mode.
 2. The multiplier of claim 1, wherein a first value of the select signal passes the carryin signal to the second input of the first adder and a second value of the select signal blocks the carryin signal from the second input of the first adder.
 3. The multiplier of claim 1, wherein the blocking circuit includes a first logic gate having a first input coupled for receiving the carryin signal, a second input coupled for receiving the select signal, and an output coupled to the second input of the first adder.
 4. The multiplier of claim 3, wherein the first logic gate includes an AND-gate.
 5. The multiplier of claim 1, wherein the blocking circuit includes a multiplexer having a first input for receiving the carryin signal, a second input for receiving a logic signal, a control input that receives the select signal, and an output that is coupled to the second input of the first adder.
 6. The multiplier of claim 1, wherein the multiplier cell further includes: a second adder having first and second inputs coupled for receiving respective second and third data signals; a third adder having a first input coupled to a first output of the first adder and a second input coupled to a first output of the second adder; and a fourth adder having a first input coupled to a first output of the third adder, a second input coupled to a second output of the first adder, and a third input coupled to a second output of the second adder.
 7. The multiplier of claim 6, wherein the first and fourth adders are full adders.
 8. The multiplier of claim 6, wherein the second and third adders are half adders.
 9. The multiplier of claim 6, wherein the multiplier cell further includes a latch having an input coupled to an output of the third adder and an output that supplies a stored product value generated by the multiplier cell.
 10. The multiplier of claim 6, wherein the fourth adder has an output that provides a carryout signal.
 11. The multiplier of claim 1, wherein the first value selects the multiplier to compute an integer-modulo-N multiplication and the second value selects the multiplier to compute a modular polynomial-basis multiplication.
 12. A method of performing modulo arithmetic in a multiplier, comprising the steps of: receiving data values during a first multiplication cycle; blocking the data values during a second multiplication cycle; receiving a modulus value N during a second multiplication cycle; and reducing a product of the data values by the modulus value N during a second multiplication cycle.
 13. The method of claim 12, wherein the modulus value N is an odd integer value.
 14. The method of claim 12, wherein the modulus value N is an irreducible polynomial value.
 15. A method of performing an arithmetic operation in a multiplier, comprising the steps of: adding a data signal with a carryin signal in an adder to provide an output signal; blocking the carryin signal from the adder when the select signal has a second value; feeding the output signal as an input of the adder during a second multiplication cycle; receiving a modulus value N during the second multiplication cycle; and reducing a value of the output signal by the modulus value N during the second multiplication cycle.
 16. The method of claim 15, further including the step of passing the carryin signal to the adder when a select signal has a first value.
 17. The method of claim 15, further including the steps of: adding a second carryin signal in generating the output signal; passing the second carryin signal to the adder when a select signal has a first value; and blocking the second carryin signal from the adder when the select signal has a second value.